This tutorial covers reading in data and creating some basic maps with the simple features (sf) package. The first step is to load some packages.
library(sf)
library(dplyr)
library(readr)
library(ggplot2)
library(mapview)
library(lubridate)
Sys.setenv(RSTUDIO_PANDOC="C:/Program Files/RStudio/bin/pandoc")
# set directory from which we read in data
ddir <- "C:/Users/Yixin Sun/Documents/Dropbox/EPIC Predoc Resources/epic_orientation/data/Spatial_crime"
# function that creates custom map style for ggplot
theme_nothing <- function(base_size = 12, title_size = 15){
return(
theme(panel.grid.major = element_line(colour = "transparent"),
panel.grid.minor = element_line(colour = "transparent"),
panel.background = element_blank(), axis.line = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank(),
plot.title = element_text(size = title_size, hjust = 0.5),
plot.caption = element_text(size = 6)))
}
sf Objects

Our example will stem from a simple dataset on crimes in the UK. Often with point data, the coordinates come in a table (latitude/longitude, or here eastings and northings), which we coerce into a spatial object. The important thing is to know which coordinate reference system (CRS) the coordinates are in.
utm_crs <- 27700
# load data in and get rid of spaces in column names
streetCrime <-
read_csv(file.path(ddir, "2012-03-cumbria-street.csv")) %>%
rename(Report_by = `Reported by`,
Falls_within = `Falls within`,
Crime_type = `Crime type`)
## Parsed with column specification:
## cols(
## Month = col_character(),
## `Reported by` = col_character(),
## `Falls within` = col_character(),
## Easting = col_integer(),
## Northing = col_integer(),
## Location = col_character(),
## `Crime type` = col_character(),
## Context = col_character()
## )
# take a look at the data as a dataframe
glimpse(streetCrime)
## Observations: 4,350
## Variables: 8
## $ Month <chr> "2012-03", "2012-03", "2012-03", "2012-03", "2012...
## $ Report_by <chr> "Cumbria Constabulary", "Cumbria Constabulary", "...
## $ Falls_within <chr> "Cumbria Constabulary", "Cumbria Constabulary", "...
## $ Easting <int> 351534, 299210, 351183, 301277, 299416, 361096, 3...
## $ Northing <int> 492353, 528515, 492741, 524712, 527779, 478662, 5...
## $ Location <chr> "On or near Dowker'S Lane", "On or near Supermark...
## $ Crime_type <chr> "Burglary", "Burglary", "Burglary", "Burglary", "...
## $ Context <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
# convert from dataframe to sf object
streetCrime <-
streetCrime %>%
st_as_sf(coords = c("Easting", "Northing"), crs = utm_crs)
# take a look at the data as an sf object
glimpse(streetCrime)
## Observations: 4,350
## Variables: 7
## $ Month <chr> "2012-03", "2012-03", "2012-03", "2012-03", "2012...
## $ Report_by <chr> "Cumbria Constabulary", "Cumbria Constabulary", "...
## $ Falls_within <chr> "Cumbria Constabulary", "Cumbria Constabulary", "...
## $ Location <chr> "On or near Dowker'S Lane", "On or near Supermark...
## $ Crime_type <chr> "Burglary", "Burglary", "Burglary", "Burglary", "...
## $ Context <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ geometry <POINT [m]> POINT (351534 492353), POINT (299210 528515...
There are several ways to visualize spatial data: base R, mapview, or ggplot2. mapview is great for quickly creating interactive visualizations, and is therefore a convenient way to inspect geometries (for example, to check whether you have specified the right coordinate reference system).
plot(streetCrime$geometry)
# mapview has several types of maps the data can be overlaid on
# check which type of map you'd like here http://leaflet-extras.github.io/leaflet-providers/preview/
mapview(streetCrime, map.types = "OpenStreetMap.DE")
streetCrime %>%
ggplot() +
geom_sf(aes(colour = Crime_type)) +
ggtitle("Street Crime By Type") +
theme_nothing()
UK crime data is organized by neighbourhood. Here, we read in the shapefile of the neighbourhoods and aggregate the point data to the neighbourhood level. Note that the neighbourhood polygons are in longitude/latitude (EPSG 4326) while the crime data uses projected easting/northing coordinates (EPSG 27700), so we must first convert both layers to the same CRS.
# read in neighbourhoods shapefile
nh <-
file.path(ddir, "neighbourhoods.shp") %>%
st_read(stringsAsFactors = F) %>%
st_transform(utm_crs)
## Reading layer `neighbourhoods' from data source `C:\Users\Yixin Sun\Documents\Dropbox\EPIC Predoc Resources\epic_orientation\data\Spatial_crime\neighbourhoods.shp' using driver `ESRI Shapefile'
## Simple feature collection with 46 features and 3 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: -3.64057 ymin: 54.03964 xmax: -2.159025 ymax: 55.18898
## epsg (SRID): 4326
## proj4string: +proj=longlat +ellps=WGS84 +no_defs
# plot the polygons with points on top
nh %>%
ggplot() +
geom_sf() +
geom_sf(data = streetCrime) +
ggtitle("Neighbourhood Polygons") +
theme_nothing()
This plain map is not visually appealing, and besides showing that some parts have more concentrated dots than others, it’s not very informative either. Let’s instead color the neighbourhoods by the number of crimes in each region and create a choropleth map.
# Join crime data to neighbourhood polygons
# aggregate total crime in each neighbourhood
nh_crime <-
nh %>%
st_join(streetCrime) %>%
group_by(ID, Name, Population) %>%
summarise(total_crimes = n()) %>%
mutate(crime_per_capita = total_crimes / Population)
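As a sanity check on the spatial join, you can also count points per polygon directly with st_intersects. This is a sketch, not part of the original workflow; the column name n_direct is just an illustrative choice, and it assumes both layers share the same CRS (they do after the st_transform above).

```r
# cross-check the join: st_intersects returns, for each polygon, the indices
# of the points that fall inside it; lengths() turns that into a count
nh$n_direct <- lengths(st_intersects(nh, streetCrime))

# the distribution of counts should line up with total_crimes from st_join
summary(nh$n_direct)
```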
# plot choropleth maps: total crime and per-capita crime side by side
per_capita_map <-
nh_crime %>%
ggplot() +
geom_sf(aes(fill = crime_per_capita)) +
theme_nothing() +
scale_fill_continuous(low = "#fee0d2", high = "#67000d") +
ggtitle("Per Capita Crime")
tot_crime_map <-
nh_crime %>%
ggplot() +
geom_sf(aes(fill = total_crimes)) +
theme_nothing() +
scale_fill_continuous(low = "#fee0d2", high = "#67000d") +
ggtitle("Crime in Neighbourhoods")
# use gridExtra package to print maps side by side
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
grid.arrange(tot_crime_map, per_capita_map, ncol = 2)
You’ll notice that most of the county is pretty light colored - this is because the distribution is very skewed. Try taking a log transform and mapping that.
log_per_capita_map <-
nh_crime %>%
ggplot() +
geom_sf(aes(fill = log(crime_per_capita))) +
theme_nothing() +
scale_fill_continuous(low = "#fee0d2", high = "#67000d") +
ggtitle("Log Crime per Capita")
log_tot_crime_map <-
nh_crime %>%
ggplot() +
geom_sf(aes(fill = log(total_crimes))) +
theme_nothing() +
scale_fill_continuous(low = "#fee0d2", high = "#67000d") +
ggtitle("Log Crime")
# use gridExtra package to print maps side by side
grid.arrange(log_tot_crime_map, log_per_capita_map, ncol = 2)
Some neighbourhoods are clearly more rural than others, so it might be useful to visualize the land types in the analysis region. Here, we learn how to read in and extract information from a raster file.
Raster files store information for each pixel. In this case, we have a land use classification at each pixel level. What we want to do is for each neighbourhood polygon, count up the number of pixels in that polygon that fall within each land use classification.
To do this we use three files:

* a raster file, in which each pixel is classified by a number
* a raster legend file, a dictionary mapping the numbers to land use classifications
* the neighbourhood polygon file, which we overlay onto the raster to count the number of pixels of each classification within each polygon
library(raster)
## Loading required package: sp
##
## Attaching package: 'raster'
## The following object is masked from 'package:dplyr':
##
## select
# read in legend for land use raster file
legend <-
file.path(ddir, "legend.csv") %>%
read_csv()
## Parsed with column specification:
## cols(
## GRID_CODE = col_integer(),
## LABEL1 = col_character(),
## LABEL2 = col_character(),
## LABEL3 = col_character(),
## Colour = col_character()
## )
head(legend)
# read in raster file
landUse <- raster(file.path(ddir, "cumbriaLandUse.tif"))
plot(landUse)
# the extract() function from the raster package extracts the values of the
# raster pixels covered by a polygon. Let's look at this with 1 polygon
# note that extract currently only works with `sp` objects, so we first
# transform nh to an `sp` object
nh_sp <- as(nh, "Spatial")
extract(landUse, nh_sp[1,])
## Warning in .local(x, y, ...): Transforming SpatialPolygons to the CRS of
## the Raster
## [[1]]
## [1] 18 18 18 18 18 39 18 18 18 39 18 18 3 3 39 18 18
## [18] 3 3 3 3 18 18 3 3 3 3 3 3 3 3 3 3 3
## [35] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [52] 3 3 3 3 3 2 2 3 3 3 3 3 3 2 2 2 2
## [69] 3 3 3 3 3 3 2 2 2 1 2 2 3 3 3 3 3
## [86] 2 2 2 1 1 1 2 2 3 3 3 3 3 2 2 1 1
## [103] 1 1 2 2 3 3 3 2 2 2 1 1 1 1 1 2 2
## [120] 3 3 3 2 2 2 1 1 1 1 1 1 2 3 3 3 2
## [137] 2 2 2 1 1 1 1 1 2 2 3 3 3 3 2 2 2
## [154] 1 1 1 1 1 1 2 2 3 3 3 3 2 2 2 1 1
## [171] 1 1 1 1 1 1 3 3 3 3 3 2 2 1 1 1 1
## [188] 1 1 1 1 2 3 3 3 3 3 3 1 1 1 1 1 1
## [205] 1 1 1 2 2 3 3 5 5 5 5 1 1 1 1 1 1
## [222] 1 1 1 1 2 3 5 5 5 5 5 5 5 1 1 1 1
## [239] 1 1 1 1 1 1 3 3 3 3 5 5 5 5 5 5 5
## [256] 1 1 1 1 1 1 1 1 2 255 3 3 3 3 5 5 5
## [273] 5 5 5 5 1 1 1 5 5 1 2 2 255 3 3 3 3
## [290] 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [307] 255 3 3 3 3 3 3 3 5 5 5 5 5 5 5 5 5
## [324] 5 5 5 5 5 5 0 3 3 3 3 3 3 3 3 5 5
## [341] 5 5 5 5 5 5 5 5 5 5 5 5 5 0 0 0 255
## [358] 3 3 3 3 3 3 3 3 5 5 5 5 5 5 5 5 5
## [375] 5 5 0 0 0 0 0 0 255 3 3 3 3 3 3 3 3
## [392] 3 5 5 5 5 5 5 5 0 0 0 0 0 0 0 0 0
## [409] 0 255 3 3 3 3 3 3 3 5 5 5 5 5 5 5 5
## [426] 0 0 0 0 0 0 0 0 0 255 3 3 3 3 3 3 3
## [443] 5 5 5 5 5 5 5 5 5 0 0 0 0 0 0 0 0
## [460] 0 0 255 3 3 3 3 3 3 5 5 5 5 5 5 5 5
## [477] 5 5 0 0 0 0 0 0 0 0 255 5 5 3 3 5 5
## [494] 5 5 5 5 5 5 5 5 0 0 0 0 0 0 0 0 0
## [511] 255 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 0
## [528] 0 0 0 0 0 5 5 5 5 5 5 5 5 5 5 5 5
## [545] 5 5 0 0 0 0 5 5 5 5 5 5 5 5 5 5 5
## [562] 5 0 0 5 5 5 5 5 5 5 5 5 0 0 5 5 5
## [579] 5 5 5 5 0 255 5 5 5 5 5 5 5 5 5 5 5
## [596] 5 5 39
# extract pixel values for all polygons
lu_extracted <- extract(landUse, nh_sp)
## Warning in .local(x, y, ...): Transforming SpatialPolygons to the CRS of
## the Raster
# function for counting up land use classification in a polygon,
# and returns the fraction of pixels in each classification
lc_extract <- function(p, id){
lc_temp <-
as_tibble(p) %>%
filter(value != 0) %>%
left_join(legend, by = c("value" = "GRID_CODE")) %>%
group_by(Classification = LABEL2) %>%
summarise(n = n()) %>%
mutate(ID = id,
perc_covered = n / sum(n))
return(lc_temp)
}
# use purrr, a functional programming package, instead of the apply family,
# to shape the list of classifications into percent of polygon area covered
library(purrr)
lu_classified <- map2_df(lu_extracted, nh$ID, lc_extract)
# for our broad classification, we take the classification with the most number
# of pixels
lu_crime <-
lu_classified %>%
group_by(ID) %>%
arrange(-perc_covered) %>%
filter(row_number() == 1) %>%
right_join(nh_crime)
## Joining, by = "ID"
# map land use
lu_crime %>%
ggplot() +
geom_sf(aes(fill = Classification)) +
theme_nothing() +
theme(legend.position = "bottom") +
ggtitle("Land Use of Neighbourhoods") +
guides(fill = guide_legend(nrow=2))
lu_crime %>%
ggplot() +
geom_sf(aes(fill = Classification)) +
theme_nothing() +
ggtitle("Land Use and Crime") +
geom_sf(data = streetCrime, shape = 3) +
theme(legend.position = "none")
Here we are interested in whether crimes happen close to major roads, or whether specific types of crime usually happen near roads. We load in road data and find the distance from each crime to the nearest road. Let’s plot the density and cumulative distribution of distances for violent crime, burglary, and shoplifting.
roads <-
file.path(ddir, "mainroads.shp") %>%
st_read(stringsAsFactors = F) %>%
st_transform(utm_crs)
## Reading layer `mainroads' from data source `C:\Users\Yixin Sun\Documents\Dropbox\EPIC Predoc Resources\epic_orientation\data\Spatial_crime\mainroads.shp' using driver `ESRI Shapefile'
## Simple feature collection with 3486 features and 3 fields
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: -3.602487 ymin: 54.051 xmax: -2.00917 ymax: 55.13869
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
table(roads$type)
##
## motorway motorway_link primary primary_link road
## 198 75 610 9 193
## secondary secondary_link tertiary tertiary_link trunk
## 629 3 945 11 757
## trunk_link
## 56
# keep just major roads
roads <-
roads %>%
filter(type %in% c("motorway", "motorway_link", "trunk", "trunk_link",
"primary", "primary_link"))
# Get distance to closest main road
streetCrime$dRoad <-
st_distance(streetCrime, roads) %>%
apply(1, min)
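An aside: computing the full point-by-road distance matrix and taking row minima can be slow for larger datasets. A lighter-weight sketch, assuming a version of sf that provides st_nearest_feature (the column name dRoad2 is illustrative, not from the original):

```r
# find the index of the nearest road for each crime point, then compute
# only the element-wise distances to those roads
nearest <- st_nearest_feature(streetCrime, roads)
streetCrime$dRoad2 <-
  st_distance(streetCrime, roads[nearest, ], by_element = TRUE) %>%
  as.numeric()
```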
streetCrime %>%
filter(Crime_type %in% c("Violent crime", "Burglary",
"Shoplifting")) %>%
ggplot() +
geom_density(aes(dRoad, group = Crime_type, colour = Crime_type)) +
theme_minimal()
streetCrime %>%
filter(Crime_type %in% c("Violent crime", "Burglary",
"Shoplifting")) %>%
ggplot() +
stat_ecdf(aes(dRoad, group = Crime_type, colour = Crime_type)) +
theme_minimal()
We also have a data file of crime counts by neighbourhood for a number of months. The file has one row per neighbourhood per month, listing the number of crimes in each category. Here, we first join this with the neighbourhood polygons and create some time series plots.
allCrime <-
file.path(ddir, "allCrime.csv") %>%
read_csv
## Parsed with column specification:
## cols(
## Month = col_character(),
## Force = col_character(),
## Neighbourhood = col_character(),
## `All crime and ASB` = col_integer(),
## Burglary = col_integer(),
## `Anti-social behaviour` = col_integer(),
## Robbery = col_integer(),
## `Vehicle crime` = col_integer(),
## `Violent crime` = col_integer(),
## `Public Disorder and Weapons` = col_integer(),
## Shoplifting = col_integer(),
## `Criminal damage and Arson` = col_integer(),
## `Other Theft` = col_integer(),
## Drugs = col_integer(),
## `Other crime` = col_integer()
## )
glimpse(allCrime)
## Observations: 322
## Variables: 15
## $ Month <chr> "2011-09", "2011-09", "2011-09",...
## $ Force <chr> "Cumbria Constabulary", "Cumbria...
## $ Neighbourhood <chr> "GARS07", "GARS06", "GARS05", "G...
## $ `All crime and ASB` <int> 41, 141, 56, 82, 119, 90, 360, 1...
## $ Burglary <int> 3, 3, 0, 2, 4, 2, 8, 2, 7, 1, 3,...
## $ `Anti-social behaviour` <int> 18, 39, 36, 50, 56, 53, 194, 3, ...
## $ Robbery <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Vehicle crime` <int> 3, 2, 1, 1, 7, 1, 3, 0, 3, 2, 0,...
## $ `Violent crime` <int> 2, 16, 11, 11, 19, 10, 48, 3, 12...
## $ `Public Disorder and Weapons` <int> 1, 2, 0, 3, 2, 2, 17, 0, 1, 1, 1...
## $ Shoplifting <int> 2, 7, 0, 1, 3, 2, 25, 0, 5, 1, 0...
## $ `Criminal damage and Arson` <int> 2, 54, 4, 9, 12, 11, 27, 3, 14, ...
## $ `Other Theft` <int> 7, 11, 4, 4, 8, 4, 24, 0, 7, 7, ...
## $ Drugs <int> 1, 4, 0, 1, 5, 1, 7, 0, 2, 0, 3,...
## $ `Other crime` <int> 2, 3, 0, 0, 3, 4, 7, 0, 1, 2, 0,...
# right now Month is stored as a character - coerce it into a proper date
# using the lubridate package
allCrime <- mutate(allCrime, Month = ymd(paste0(Month, "-01")))
glimpse(allCrime)
## Observations: 322
## Variables: 15
## $ Month <date> 2011-09-01, 2011-09-01, 2011-09...
## $ Force <chr> "Cumbria Constabulary", "Cumbria...
## $ Neighbourhood <chr> "GARS07", "GARS06", "GARS05", "G...
## $ `All crime and ASB` <int> 41, 141, 56, 82, 119, 90, 360, 1...
## $ Burglary <int> 3, 3, 0, 2, 4, 2, 8, 2, 7, 1, 3,...
## $ `Anti-social behaviour` <int> 18, 39, 36, 50, 56, 53, 194, 3, ...
## $ Robbery <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Vehicle crime` <int> 3, 2, 1, 1, 7, 1, 3, 0, 3, 2, 0,...
## $ `Violent crime` <int> 2, 16, 11, 11, 19, 10, 48, 3, 12...
## $ `Public Disorder and Weapons` <int> 1, 2, 0, 3, 2, 2, 17, 0, 1, 1, 1...
## $ Shoplifting <int> 2, 7, 0, 1, 3, 2, 25, 0, 5, 1, 0...
## $ `Criminal damage and Arson` <int> 2, 54, 4, 9, 12, 11, 27, 3, 14, ...
## $ `Other Theft` <int> 7, 11, 4, 4, 8, 4, 24, 0, 7, 7, ...
## $ Drugs <int> 1, 4, 0, 1, 5, 1, 7, 0, 2, 0, 3,...
## $ `Other crime` <int> 2, 3, 0, 0, 3, 4, 7, 0, 1, 2, 0,...
# join with neighbourhoods to create a neighbourhood-month panel
# note that a left_join preserves the class of the first argument, in this case
# it results in an sf dataframe
nh_all <-
nh %>%
left_join(allCrime, by = c(ID = "Neighbourhood"))
# make sure neighbourhood-month is unique
nh_all %>%
group_by(ID, Month) %>%
filter(n() > 1)
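The same uniqueness check can be run on the plain data frame before joining, which sidesteps the sf warning above; count() is a dplyr shortcut for group_by() followed by summarise(n = n()). This is an alternative sketch, not part of the original workflow:

```r
# any Neighbourhood-Month combination appearing more than once is a
# duplicate; an empty result means the panel is unique
allCrime %>%
  count(Neighbourhood, Month) %>%
  filter(n > 1)
```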
## Warning in `[<-.data.frame`(`*tmp*`, is_list, value = list(`18` = "<S3:
## sfc_GEOMETRY>")): replacement element 1 has 1 row to replace 0 rows
# plot just the first four months of all crimes
nh_all %>%
filter(Month <= ymd(20111201)) %>%
mutate(Month = format(Month, "%B")) %>%
ggplot(aes(fill = `All crime and ASB`)) +
geom_sf() +
facet_wrap(~ factor(Month,
levels = c("September", "October", "November", "December")),
nrow = 1) +
scale_fill_continuous(low = "#fff5f0", high = "#67000d") +
theme_nothing() +
theme(legend.position = "bottom")
A fun option is to represent your data as a GIF! The main package for this is animation, although I often use gganimate, which is a simple extension of the ggplot2 package.
library(gganimate)
##
## Attaching package: 'gganimate'
## The following object is masked from 'package:raster':
##
## animate
nh_all %>%
mutate(Month = format(as.Date(Month), "%Y-%m")) %>%
ggplot(aes(fill = log(`All crime and ASB`))) +
geom_sf() +
scale_fill_continuous(low = "#fff5f0", high = "#67000d") +
theme_nothing() +
theme(legend.position = "bottom") +
labs(title = "{closest_state}") +
transition_states(Month, 1, 1)
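To render and save the animation explicitly rather than relying on the default print method, gganimate provides animate() and anim_save(). This is a sketch; the object name, file name, and frame settings are illustrative choices, not from the original:

```r
# build the animation from the most recent plot, then write it to a GIF
crime_anim <- animate(last_plot(), nframes = 100, fps = 10)
anim_save("crime_by_month.gif", crime_anim)
```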
Exercise: from the Spatial_palo_alto data, read in the palo_alto and palo_alto_freeway shapefiles. Plot both layers onto one map. Using palo_alto, map PrCpInc and overlay the freeway layer on top, then make some maps to show patterns of freeway layouts and different demographics. Do any patterns emerge here?